INTRODUCTION


“Genomics,” the study of the genome of a species, is the buzzword of the twenty-first century, thanks to the Human Genome Project that ushered in this new era of biology (Baltimore, 2001). Genomic tools are simple, straightforward, and increasingly accurate, making biology a more exact science, similar to physics and chemistry. Nonetheless, the purpose of the studies and the design of the experiments become more critical in such a venture, owing to the enormous diversity and complexity of the biological phenomena that operate in evolutionary processes. The Indian subcontinent, the second successful home of humankind, is special in this evolutionary process because of its populations' long history, many migrations, isolation, divergence, and cultural evolution since the first emigration of humans (Wells et al., 2001). The impact of natural selection on these disparate gene pools in an alien environment is a matter of intense scrutiny, since no parallel for such longstanding and sympatrically isolated populations exists in other parts of the world, apart from the birthplace of humankind in Africa. Many of these isolations seem to have occurred before languages developed. Geographical, subsistence-related, and cultural isolation presumably led to different language developments in various parts of India. We attempt here to interpret the non-recombining Y (NRY) chromosome polymorphisms of India in the context of migrations and the origin of languages.

In 1901, Karl Landsteiner, the Nobel laureate who discovered the human ABO blood group, first provided direct evidence for the existence of genomic diversity in human populations. In 1919, Hirszfeld and Hirszfeld found ABO gene variations among human populations. The B blood group stood out as most prevalent in South Asia, particularly in southern Indian tribal populations (Cavalli-Sforza et al., 1994). During the 1950s and 1960s, more systematic analysis of variation in genes and proteins became possible with the detection by Pauling et al. (1949) of blood protein polymorphisms in hemoglobin. The 1980s were a transitional period, moving from the analysis of gene and protein polymorphisms (Sanghvi et al., 1981) to studies of DNA sequence polymorphisms in the form of the Human Genome Project and other variome projects (Baltimore, 2001). To better understand the origin of this genomic diversity, one needs to study population-level forces such as migration and miscegenation, which play major roles in creating diversity. It has been proposed that genomic differentiation among populations is mostly due to “fission” followed by independent evolution (Cavalli-Sforza, 1997). Mutation, natural selection, and drift play important roles in shaping diversity at the population level. While mutations supply the raw material for genomic diversity by introducing new alleles, the survival and expansion of those alleles depend on their fitness and functional importance. The study of polymorphism at the single-nucleotide (SNP) level in introns (noncoding regions) or exons (coding regions), or at the microsatellite level, has become a powerful tool for studying genomic diversity in both health and disease. Recent literature suggests that NRY markers are ideal candidates for studying population diversity. The NRY is regarded as an evolutionarily neutral marker, thus permitting us to reconstruct a population's history. The distribution of NRY variation across the linguistic states of India is therefore of particular interest, and its analysis sheds light on population migrations and on the development and spread of languages.


RECENT AFRICAN ORIGINS 


The hypothesis of a recent African origin and spread of anatomically modern humans holds that Homo sapiens sapiens, our species, evolved from a small African population that subsequently colonized the whole world, supplanting former hominids, ~120-200 thousand years ago (kya), around the time of the first appearance of anatomically modern humans (Cann et al., 1987). This replacement model, now widely accepted, was later called the “Out of Africa,” or “Recent African Origin” (RAO) model, in contrast to the earlier Multi-Regional Evolution model (MRE; see Wolpoff et al., 1984). Molecular evidence favors the RAO model. Older populations, having evolved for longer, have had more time to accumulate genomic diversity. The excess African diversity can thus be explained by an earlier onset of demographic expansion in Africa, combined with higher effective population size, population size fluctuations, and also periodic extinctions of populations outside Africa or positive selection through adaptation to new environments outside Africa (Eller, 2001; Aquadro et al., 2001). The non-African patterns of genetic variation are indeed a subset of the African ones. Microsatellite studies also showed a gradual reduction of diversity with increasing distance from Africa, and linkage disequilibrium values reflect the younger ages of haplotypes in non-African populations (Tishkoff et al., 1996). The Indian subcontinent, the second region to be occupied by modern humans, thus invites further investigation in these directions. The RAO model proposes one, two, or multiple migrations using various routes over a period of time. Two routes have been proposed: the first is the “northern route” over Sinai, leading to eastern Asia through the steppes of central Asia and southern Siberia, and the second is the “southern route” over southern Arabia, followed by migration along the coastline of India. While the northern route model could explain the peopling of the whole of Eurasia by a single migration from Africa, the southern route model is interpreted as implying at least two separate late Pleistocene dispersal events, one leading to the northwest and the other to the east of Eurasia (Cavalli-Sforza et al., 1994).


INDIAN CORRIDOR 


Being positioned at the tri-junction of the African, northern Eurasian, and Oriental realms, India has served as a major corridor for the dispersal of modern humans (Cann, 2001) and has attracted many streams of people since the Paleolithic, starting with the Late Pleistocene, as supported by archaeological evidence (Paddaya, 1982; Misra, 2001; Petraglia et al., 2010). Though modern anthropology tends to reject somatoscopic and anthropological measurements, there is a revival of interest in deciphering skin color genes and studying them with modern genomic tools (Yuldasheva et al., 2002). Sanghvi and Karve, in distinguishing various castes of Tamil Nadu, India, found nose shape and skin color to be the most discriminating traits (Sanghvi et al., 1981). Even today, Indian physical anthropologists use these older classifications, identifying four different morphological groups in India (Bhasin, 2006). These are: (i) “Negritos,” characterized by dwarf stature and frizzy hair, who are common in the Nilgiri hills of Tamil Nadu (Paniya, Irula, and Kadar tribes) and the Andaman Islands, and whose features are seen nowadays in many caste populations, including Brahmins; (ii) “proto-Australoids,” characterized by long head, dark skin, and broad nose, found in central and southern India and speaking Dravidian languages/dialects; (iii) “Mongoloids,” characterized by broad face, medium stature, yellow skin, and slightly obliquely set eyes, found exclusively in the sub-Himalayan and northeastern regions and speaking Austro-Asiatic (AA) or Tibeto-Burman (TB) languages; and (iv) the “Dinaric” type (Mediterranean element), with medium to light pigmentation, hooked nose, and acrocephalic, round heads, found in Bengal and Orissa. The “Caucasoids,” or “Nordics,” with blond hair and long heads and speaking Indo-European (IE) languages, are most common in the north and northwestern regions of India. The four major language families of India seem to have their own non-overlapping geographic clines. It will be interesting to compare the distribution of the NRY markers with the origin of these languages, and to ask whether the languages could have arisen through fission and a long process of isolation in various regions of India.


NRY PHYLOGENY IN INDIA 
AFRICAN ROOT 


The roots of the Y phylogeny in Africa have been dated to around 100 kya (Underhill, 2003); they are characterized by the NRY-SNP haplogroups (HGs) A-M91 and B-M60, which are restricted to Africa. These migrations and subsequent mutations formed the scaffold on which all other Y-chromosome diversification, with its geographical clines, has occurred. The majority of Y lineages across the globe are composed of a tripartite assemblage consisting of (1) HG C-M130, (2) HG D-M174 and HG E-M96, and (3) the overarching HG F-M89, which defines the internal node of all remaining HGs, G-M201 through R-M207 (Underhill et al., 2001; Wells, 2007).


OUT OF AFRICA EMIGRATIONS 


HG C-M130, not seen in any African population, presumably originated somewhere in Asia on an M168 lineage sometime after an early departure event (Capelli et al., 2001; Underhill et al., 2001; Table 74-1). This clade (C-M130) characterizes the first migrants into India, whose descendants could be identified near Madurai (Wells et al., 2001). The clade has many sublineages displaying irregular geographic patterning, consistent with diversification and northward migration of HG C-M130 since the last ice age, with its westernmost limit in India. Thus HG C3-M217, defined by a transversion mutation, is common in East Asia and Siberia, with representatives in North America (Karafet et al., 2001; Lell et al., 2002) and in the eastern and central parts of Central Asia, while C5-M356 is common in India (Sengupta et al., 2006). This lineage is absent in Indonesia, Oceania (Kayser et al., 2000), and Yunnan, China (Karafet et al., 2001).

The ancestral lineages that accumulated the HG D-M174 and HG E-M96 mutations could have arisen in Africa or Asia (Underhill, 2003). HG E-M96 lineages are the most frequent in Africa, and display subsequent binary and microsatellite diversification. Conversely, the Asian haplogroup D-M174 occurs at low frequencies throughout eastern Asia, except in remote and isolated locations such as Tibet, Japan, and the Andaman Islands (Underhill et al., 2001; Thangaraj et al., 2003).

The third major and most successful subclade of the M168 lineages, characterized by super-haplogroup F-M89, defines the root from which all other haplogroups (HGs G-M201 through R-M207) originated and evolved outside Africa (Kivisild et al., 2003). HG F-M89 diversified into many branches with region-specific markers: the Middle East shows HGs G-M201 and J-M304, Europe shows HG I-M170, and India shows the F-M89 and H-M69 lineages, which are seldom observed elsewhere (Table 74-1).


EXPANSION IN INDIA 


HG F*-M89 is the most paraphyletic subcluster (unclassified derivatives) of the M168 lineages; it is ubiquitous but found at lower frequency in various parts of India. Many tribal populations of southern India possess higher frequencies of F*-M89 with high STR variance, particularly the Dravidian-speaking groups of Tamil Nadu and the Koya of Orissa (Kavitha, 2008; Wells et al., 2001; Kivisild et al., 2003; Cordaux et al., 2004a; Table 74-2). The high STR variance of HG F*-M89 in Tamil Nadu and Andhra Pradesh has suggested a deep time depth of 45,000 years before present (YBP) (Sengupta et al., 2006).
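The link drawn above between STR variance and time depth can be illustrated with a rough calculation. The sketch below assumes a simple stepwise mutation model in which the variance of repeat counts at a locus grows roughly linearly with time; the repeat counts are hypothetical, and the mutation rate and generation length are assumed values chosen for illustration, not figures taken from the studies cited here.

```python
from statistics import pvariance

# Hypothetical repeat counts at one Y-STR locus for ten men who carry the
# same haplogroup (illustrative values only, not data from any cited study).
REPEAT_COUNTS = [14, 15, 15, 16, 14, 15, 17, 15, 13, 16]

# Assumptions for this sketch: mutation rate per locus per generation,
# and the length of a generation in years.
ASSUMED_MUTATION_RATE = 6.9e-4
ASSUMED_GENERATION_YEARS = 25


def str_variance(counts):
    """Population variance of the repeat counts at a single locus."""
    return pvariance(counts)


def age_from_variance(variance,
                      mutation_rate=ASSUMED_MUTATION_RATE,
                      generation_years=ASSUMED_GENERATION_YEARS):
    """Rough age in years, taking variance ~= mutation_rate * generations."""
    generations = variance / mutation_rate
    return generations * generation_years


variance = str_variance(REPEAT_COUNTS)
age_years = age_from_variance(variance)
print(f"variance = {variance:.2f}, rough age = {age_years:,.0f} years")
```

In practice the variance would be averaged over several loci, and the result is only as good as the assumed rate and the representativeness of the sample, which is why published ages of this kind carry wide uncertainty.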


TABLE 74-1 THE AGES OF THE NRY HAPLOGROUPS AND THEIR DISCRIMINATING ALLELES PREVALENT IN INDIA AND NEARBY REGIONS.

NRY HG | Marker | Estimated age of the mutation (YBP*) | Distribution | Reference
C | M130 | 50,000 | India, Australia, Central Asia, America | Genographic#
F | M89 | 45,000 | India | Genographic#
G | M201 | 30,000 | | Genographic#
H | M69 | 20,000-30,000 | | Genographic#
H1 | M52 | 25,000 | India | Genographic#
J | M304 | 31,700 | Middle East | Semino et al. (2004)
J2 | M172 | 15,000-20,000 | | Hammer et al. (2000)
K | M9 | 40,000 | | Genographic#
L | M20 | 30,000 | India | Genographic#
L1 | M27/M76 | 9,100 | | Sengupta et al. (2006)
O | M175 | 35,000 | Orissa, North East, South East Asia | Genographic#
O2a | M95 | 11,700 | Orissa | Sengupta et al. (2006)
O3 | M122 | 10,000 | North East, South East Asia, China | Genographic#
P | M45 | 40,000 | North Asia | Wells et al. (2001)
Q | M242 | 15,000-18,000 | | Seielstad et al. (2003)
R | M207 | 30,000 | | Genographic#
R2 | M124 | 25,000 | India | Genographic#
R1a1 | M17 | 15,000 | Caucasus, Europe, India | Wells et al. (2001)

*The estimated ages have all been determined from the available data; if the sampling was not representative, the age may vary. Ascertainment bias is possible due to smaller samples and sampling errors; hence, some of these ages may need to be considered with caution.
#www.nationalgeographic.com/genographic website.
TABLE 74-2 NRY HG ALLELE DISTRIBUTION DATA USED FOR COMPUTING PAN-ASIAN PCA

No | Population | Country / Region | Province / State | Language family | Population code in Pan-Asian PCA (Fig 2) | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
1 | Pathan | Pakistan | | IE | Pak7 | 21 | 4.762 | 0 | 0 | 9.524 | 0 | 4.762
2 | Sindhi | Pakistan | | IE | Pak8 | 21 | 0 | 0 | 0 | 0 | 0 | 4.762
3 | Hazara | Pakistan | | IE | Pak4 | 25 | 40 | 0 | 0 | 0 | 4 | 0
4 | Kalash | Pakistan | | IE | Pak5 | 20 | 0 | 0 | 0 | 20 | 0 | 0
5 | Makrani | Pakistan | | IE | Pak6 | 20 | 0 | 0 | 5 | 0 | 0 | 0
6 | Konka Brahmin | India West | Goa | IE | Goa1 | 43 | 2.326 | 0 | 0 | 0 | 0 | 4.651
7 | Gujarat | India West | Gujarat | IE | Guj3 | 29 | 17.24 | 0 | 0 | 0 | 0 | 3.448
8 | Gujarat Brahmin | India West | Gujarat | IE | Guj1 | 64 | 3.125 | 0 | 3.125 | 10.94 | 0 | 0
9 | Bhils | India West | Gujarat | IE | Guj2 | 22 | 9.091 | 0 | 0 | 0 | 0 | 18.18
10 | Desasath Brahmin | India West | Maharashtra | IE | Mah2 | 16 | 6.25 | 0 | 0 | 0 | 0 | 0
11 | Kathari | India West | Maharashtra | IE | Mah3 | 19 | 0 | 0 | 0 | 0 | 0 | 26.32
12 | Maratha | India West | Maharashtra | IE | Mah5 | 36 | 5.556 | 0 | 0 | 0 | 0 | 5.556
13 | Punjab | India West | Punjab | IE | Pun2 | 66 | 3.03 | 0 | 0 | 0 | 0 | 1.515
14 | Punjab Brahmin | India North | Punjab | IE | Pun1 | 49 | 4.082 | 0 | 0 | 4.082 | 0 | 4.082
15 | Kashmir Gujars | India North | Jammu Kashmir | IE | J&K1 | 49 | 2.041 | 0 | 0 | 0 | 0 | 4.082
16 | Kashmiri Pandits | India North | Jammu Kashmir | IE | J&K2 | 51 | 1.961 | 0 | 0 | 1.961 | 0 | 3.922
17 | Rajput | India North | Rajasthan | IE | Raj1 | 29 | 3.448 | 0 | 0 | 0 | 0 | 10.34
18 | Uttar Pradesh Brahmin | India North | Uttar Pradesh | IE | Uprd | 31 | 0 | 0 | 0 | 0 | 0 | 0
19 | Bihar Brahmins | India Central | Bihar | IE | Bih1 | 56 | 1.786 | 0 | 0 | 0 | 0 | 0
20 | Madhya Pradesh Brahmins | India Central | Madhya Pradesh | IE | MP1 | 42 | 0 | 0 | 0 | 0 | 0 | 2.381
21 | Halba | India Central | Maharashtra | IE | Mah4 | 21 | 0 | 0 | 0 | 0 | 0 | 23.81
22 | Karan | India Central | Orissa | IE | ori1 | 18 | 0 | 0 | 0 | 0 | 0 | 0
23 | Oriya Brahmin | India Central | Orissa | IE | ori2 | 24 | 0 | 0 | 0 | 0 | 0 | 4.167
TABLE 74-2 (CONTINUED)

No | H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference
1 | 9.524 | 0 | 9.524 | 0 | 0 | 0 | 0 | 0 | 0 | 9.524 | 14.29 | 38.1 | 0 | Sengupta et al., 2006
2 | 0 | 28.57 | 4.762 | 0 | 0 | 0 | 0 | 0 | 0 | 4.762 | 0 | 52.38 | 4.762 | Sengupta et al., 2006
3 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 8 | 32 | 0 | 4 | Sengupta et al., 2006
4 | 20 | 10 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 20 | 0 | Sengupta et al., 2006
5 | 0 | 25 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 10 | 25 | 10 | Sengupta et al., 2006
6 | 6.977 | 13.95 | 18.6 | 2.326 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 41.86 | 9.302 | Kivisild et al., 2003
7 | 10.34 | 20.69 | 10.34 | 3.448 | 0 | 0 | 0 | 0 | 6.897 | 0 | 0 | 24.14 | 3.448 | Kivisild et al., 2003
8 | 1.563 | 15.63 | 7.813 | 3.125 | 3.125 | 0 | 0 | 0 | 0 | 0 | 9.375 | 32.81 | 9.375 | Sharma et al., 2009
9 | 9.091 | 18.18 | 18.18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.091 | 18.18 | Sharma et al., 2009
10 | 18.75 | 12.5 | 12.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 43.75 | 6.25 | Sahoo et al., 2006
11 | 36.84 | 5.263 | 5.263 | 0 | 0 | 0 | 0 | 0 | 5.263 | 0 | 5.263 | 15.79 | 0 | Sahoo et al., 2006
12 | 30.56 | 19.44 | 11.11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.89 | 13.89 | Sengupta et al., 2006; Sahoo et al., 2006
13 | 3.03 | 21.21 | 12.12 | 0 | 0 | 0 | 0 | 0 | 7.576 | 0 | 0 | 46.97 | 4.545 | Kivisild et al., 2003
14 | 0 | 22.45 | 6.122 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 34.69 | 24.49 | Sharma et al., 2009
15 | 10.2 | 6.122 | 16.33 | 8.163 | 0 | 0 | 0 | 0 | 0 | 2.041 | 2.041 | 40.82 | 8.163 | Sharma et al., 2009
16 | 9.804 | 9.804 | 5.882 | 9.804 | 0 | 0 | 0 | 0 | 0 | 5.882 | 17.65 | 19.61 | 13.73 | Sharma et al., 2009
17 | 17.24 | 13.79 | 6.897 | 0 | 0 | 0 | 0 | 3.448 | 0 | 0 | 0 | 31.03 | 13.79 | Sengupta et al., 2006
18 | 16.13 | 3.226 | 3.226 | 0 | 0 | 0 | 0 | 0 | 0 | 6.452 | 0 | 67.74 | 3.226 | Sharma et al., 2009
19 | 0 | 8.929 | 8.929 | 3.571 | 0 | 0 | 0 | 0 | 0 | 3.571 | 3.571 | 64.29 | 5.357 | Sahoo et al., 2006; Sharma et al., 2009
20 | 7.143 | 23.81 | 7.143 | 0 | 2.381 | 0 | 0 | 0 | 2.381 | 4.762 | 0 | 38.1 | 11.9 | Sharma et al., 2009
21 | 23.81 | 0 | 0 | 0 | 0 | 0 | 28.57 | 0 | 0 | 4.762 | 0 | 19.05 | 0 | Sengupta et al., 2006
22 | 16.67 | 5.556 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55.56 | 22.22 | Sahoo et al., 2006
23 | 8.333 | 4.167 | 20.83 | 0 | 0 | 0 | 0 | 0 | 4.167 | 0 | 4.167 | 41.67 | 12.5 | Sahoo et al., 2006

(Continued)
TABLE 74-2 (CONTINUED)

No | Population | Country / Region | Province / State | Language family | Population code in Pan-Asian PCA (Fig 2) | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
24 | Lambadi | India South | Andhra Pradesh | IE | And9 | 53 | 11.32 | 0 | 0 | 0 | 0 | 3.774
25 | W.Bengal | India East | West Bengal | IE | WB1 | 34 | 3.226 | 0 | 0 | 3.226 | 0 | 6.452
26 | Karmali | India East | West Bengal | IE | WB4 | 16 | 0 | 0 | 0 | 0 | 0 | 0
27 | Kora | India East | West Bengal | IE | WB5 | 17 | 0 | 0 | 0 | 0 | 0 | 17.65
28 | WB. Brahmin | India East | West Bengal | IE | WB7 | 49 | 0 | 0 | 0 | 0 | 0 | 0
29 | Garo | India East | Meghalaya | TB | NE2 | 33 | 0 | 0 | 0 | 0 | 0 | 18.18
30 | Jamatia | India East | West Bengal | TB | NE3 | 30 | 0 | 0 | 0 | 0 | 0 | 0
31 | Korku | India West | Maharashtra | AA | Mah1 | 59 | 0 | 0 | 0 | 0 | 0 | 15.25
32 | Asur | India Central | Jharkhand | AA | Jha1 | 55 | 0 | 0 | 0 | 0 | 0 | 25.45
33 | Birjia | India Central | Jharkhand | AA | Jha2 | 24 | 0 | 0 | 0 | 0 | 0 | 0
34 | Korwa | India Central | Jharkhand | AA | Jha3 | 42 | 0 | 0 | 0 | 0 | 0 | 33.33
35 | Savar | India Central | Jharkhand | AA | Jha4 | 47 | 0 | 0 | 0 | 0 | 0 | 40.43
36 | Kharia | India Central | Jharkhand | AA | Jha5 | 46 | 2.174 | 0 | 0 | 0 | 0 | 39.13
37 | Munda | India Central | Jharkhand | AA | Jha6 | 60 | 0 | 0 | 0 | 0 | 0 | 23.33
38 | Juang | India Central | Orissa | AA | ori3 | 59 | 0 | 0 | 0 | 0 | 0 | 1.695
39 | Ho | India Central | Orissa | AA | ori5 | 116 | 0 | 0 | 0 | 0 | 0 | 22.41
40 | Mahali | India Central | West Bengal | AA | WB3 | 38 | 0 | 0 | 0 | 0 | 0 | 39.47
41 | Khasi | India East | Meghalaya | AA | NE1 | 92 | 0 | 0 | 0 | 0 | 0 | 17.39
42 | Mudi | India East | West Bengal | AA | WB2 | 37 | 0 | 0 | 0 | 0 | 0 | 45.95
43 | Lodha | India East | West Bengal | AA | WB6 | 71 | 1.408 | 0 | 0 | 0 | 0 | 14.08
TABLE 74-2 (CONTINUED)

No | H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference
24 | 5.66 | 3.774 | 11.32 | 3.774 | 0 | 0 | 0 | 0 | 33.96 | 0 | 11.32 | 13.21 | 1.887 | Sahoo et al., 2006; Kivisild et al., 2003
25 | 9.677 | 6.452 | 0 | 0 | 0 | 3.226 | 0 | 0 | 6.452 | 0 | 0 | 38.71 | 22.58 | Kivisild et al., 2003
26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | Sahoo et al., 2006
27 | 47.06 | 0 | 0 | 0 | 0 | 0 | 29.41 | 0 | 0 | 0 | 0 | 5.882 | 0 | Sahoo et al., 2006
28 | 6.122 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 71.43 | 22.45 | Sharma et al., 2009; Sengupta et al., 2006
29 | 0 | 0 | 0 | 0 | 0 | 3.03 | 18.18 | 54.55 | 0 | 0 | 6.061 | 0 | 0 | Kumar et al., 2007
30 | 3.333 | 0 | 0 | 0 | 0 | 0 | 6.667 | 76.67 | 0 | 0 | 0 | 6.667 | 6.667 | Sengupta et al., 2006
31 | 0 | 0 | 0 | 0 | 0 | 0 | 81.36 | 1.695 | 1.695 | 0 | 0 | 0 | 0 | Kumar et al., 2007
32 | 0 | 9.091 | 0 | 0 | 0 | 0 | 63.64 | 0 | 0 | 0 | 1.818 | 0 | 0 | Kumar et al., 2007
33 | 0 | 0 | 0 | 0 | 0 | 0 | 95.83 | 0 | 4.167 | 0 | 0 | 0 | 0 | Kumar et al., 2007
34 | 0 | 2.381 | 0 | 0 | 0 | 0 | 59.52 | 0 | 0 | 0 | 0 | 0 | 4.762 | Kumar et al., 2007
35 | 0 | 12.77 | 0 | 0 | 0 | 0 | 14.89 | 0 | 0 | 0 | 31.91 | 0 | 0 | Kumar et al., 2007
36 | 2.174 | 2.174 | 0 | 0 | 0 | 2.174 | 45.65 | 0 | 0 | 0 | 6.522 | 0 | 0 | Kumar et al., 2007; Sahoo et al., 2006
37 | 0 | 0 | 0 | 1.667 | 0 | 0 | 50 | 0 | 6.667 | 0 | 11.67 | 0 | 6.667 | Kumar et al., 2007; Sahoo et al., 2006
38 | 0 | 0 | 0 | 0 | 0 | 0 | 98.31 | 0 | 0 | 0 | 0 | 0 | 0 | Sahoo et al., 2006; Kumar et al., 2007
39 | 0 | 0.862 | 0 | 0 | 0 | 0 | 71.55 | 0 | 2.586 | 0 | 0 | 0 | 2.586 | Sengupta et al., 2006; Kumar et al., 2007; Sahoo et al., 2006
40 | 13.16 | 5.263 | 0 | 5.263 | 0 | 0 | 7.895 | 0 | 0 | 0 | 15.79 | 0 | 13.16 | Kumar et al., 2007; Sahoo et al., 2006
41 | 0 | 0 | 0 | 0 | 0 | 2.174 | 41.3 | 29.35 | 4.348 | 0 | 5.435 | 0 | 0 | Kumar et al., 2007
42 | 0 | 2.703 | 0 | 0 | 0 | 0 | 43.24 | 0 | 2.703 | 0 | 2.703 | 0 | 2.703 | Kumar et al., 2007
43 | 5.634 | 30.99 | 0 | 2.817 | 0 | 0 | 5.634 | 0 | 0 | 0 | 0 | 1.408 | 38.03 | Sengupta et al., 2006; Kumar et al., 2007

(Continued)
TABLE 74-2 (CONTINUED)

No | Population | Country / Region | Province / State | Language family | Population code in Pan-Asian PCA (Fig 2) | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
44 | Oraon | India Central | Jharkhand | DR | Jha7 | 100 | 0 | 0 | 0 | 0 | 0 | 54
45 | Muria | India Central | Orissa | DR | ori4 | 20 | 0 | 0 | 0 | 0 | 0 | 10
46 | Koraga | India South | Andhra Pradesh | DR | And1 | 33 | 0 | 6.061 | 0 | 0 | 0 | 0
47 | Koya | India South | Andhra Pradesh | DR | And2 | 41 | 0 | 0 | 0 | 0 | 0 | 36.59
48 | Yerava | India South | Andhra Pradesh | DR | And3 | 41 | 26.83 | 0 | 0 | 0 | 0 | 43.9
49 | Kappu naidu | India South | Andhra Pradesh | DR | And4 | 18 | 0 | 0 | 0 | 5.556 | 0 | 0
50 | Komati | India South | Andhra Pradesh | DR | And5 | 20 | 0 | 0 | 0 | 5 | 0 | 10
51 | Naikpod Gond | India South | Andhra Pradesh | DR | And6 | 18 | 22.22 | 0 | 0 | 0 | 0 | 11.11
52 | Raju | India South | Andhra Pradesh | DR | And7 | 19 | 0 | 0 | 0 | 0 | 0 | 0
53 | Yerkual | India South | Andhra Pradesh | DR | And8 | 18 | 0 | 0 | 0 | 0 | 0 | 0
54 | Konda Reddy | India South | Andhra Pradesh | DR | And10 | 30 | 0 | 0 | 0 | 0 | 0 | 23.33
55 | Koya Dora | India South | Andhra Pradesh | DR | And11 | 27 | 0 | 0 | 0 | 0 | 0 | 25.93
56 | Andh | India South | Andhra Pradesh | DR | And12 | 54 | 1.852 | 0 | 0 | 0 | 0 | 3.704
57 | Iyer | India South | Tamil Nadu | DR | TN1 | 29 | 6.897 | 0 | 0 | 10.34 | 0 | 3.448
58 | Kurumba | India South | Tamil Nadu | DR | TN2 | 19 | 0 | 0 | 0 | 0 | 0 | 15.79
59 | Iyengar | India South | Tamil Nadu | DR | TN3 | 47 | 0 | 0 | 0 | 8.511 | 0 | 0
60 | Irula | India South | Tamil Nadu | DR | TN4 | 40 | 5 | 0 | 0 | 0 | 0 | 42.5
61 | Pallan | India South | Tamil Nadu | DR | TN5 | 44 | 2.273 | 0 | 0 | 0 | 0 | 6.818
62 | Kallar | India South | Tamil Nadu | DR | TN6 | 93 | 6.452 | 0 | 0 | 0 | 0 | 16.13
63 | Sinhalese | Sri Lanka | Sri Lanka | DR | SL1 | 39 | 0 | 0 | 0 | 0 | 0 | 12.82
64 | Burushaki | Pakistan | | unclassified | Pak3 | 20 | 5 | 0 | 0 | 5 | 0 | 0

Abbreviations of language families: IE = Indo-European, AF = Afro-Asiatic, DR = Dravidian, TB = Tibeto-Burman, ST = Sino-Tibetan, AA = Austro-Asiatic, AT = Altaic, BR = Brusaki.
TABLE 74-2 (CONTINUED)

No | H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference
44 | 3 | 0 | 0 | 0 | 0 | 0 | 35 | 0 | 1 | 0 | 4 | 0 | 3 | Kumar et al., 2007; Sahoo et al., 2006
45 | 80 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | Sengupta et al., 2006
46 | 87.88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.061 | Cordaux et al., 2004a
47 | 60.98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.439 | 0 | Kivisild et al., 2003
48 | 19.51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.756 | 0 | Cordaux et al., 2004a
49 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.11 | 11.11 | 72.22 | Sahoo et al., 2006
50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 70 | Sahoo et al., 2006
51 | 61.11 | 0 | 5.556 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sahoo et al., 2006
52 | 0 | 10.53 | 21.05 | 15.79 | 0 | 0 | 0 | 0 | 0 | 0 | 15.79 | 26.32 | 10.53 | Sahoo et al., 2006
53 | 0 | 0 | 11.11 | 55.56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33.33 | 0 | Sahoo et al., 2006
54 | 3.333 | 0 | 0 | 0 | 0 | 0 | 66.67 | 0 | 0 | 0 | 0 | 6.667 | 0 | Sengupta et al., 2006
55 | 22.22 | 3.704 | 0 | 0 | 0 | 0 | 48.15 | 0 | 0 | 0 | 0 | 0 | 0 | Sengupta et al., 2006
56 | 16.67 | 35.19 | 1.852 | 0 | 0 | 0 | 1.852 | 1.852 | 0 | 0 | 0 | 31.48 | 5.556 | Thanseem et al., 2006
57 | 3.448 | 17.24 | 17.24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.448 | 27.59 | 10.34 | Sengupta et al., 2006
58 | 68.42 | 0 | 5.263 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.53 | Sengupta et al., 2006
59 | 23.4 | 19.15 | 19.15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23.4 | 6.383 | Sengupta et al., 2006; Sahoo et al., 2006
60 | 35 | 2.5 | 7.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.5 | Sengupta et al., 2006; Sahoo et al., 2006
61 | 29.55 | 9.091 | 11.36 | 9.091 | 0 | 0 | 0 | 0 | 0 | 0 | 2.273 | 15.91 | 13.64 | Sengupta et al., 2006; Sahoo et al., 2006
62 | 18.28 | 1.075 | 44.09 | 0 | 0 | 0 | 1.075 | 0 | 1.075 | 0 | 0 | 3.226 | 8.602 | Wells et al., 2001; Sahoo et al., 2006
63 | 7.692 | 10.26 | 17.95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.82 | 38.46 | Kivisild et al., 2003
64 | 15 | 5 | 15 | 5 | 0 | 0 | 0 | 5 | 0 | 0 | 30 | 0 | 15 | Sengupta et al., 2006
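Frequency rows such as those in Table 74-2 are often summarized by a single diversity value per population. The sketch below computes Nei's unbiased gene diversity, h = n/(n-1) × (1 - Σp²), for two populations; it is an illustration rather than the analysis used by the authors. The table values are treated as percentages (each row sums to roughly 100), and the function name is a hypothetical helper.

```python
def haplogroup_diversity(percentages, sample_size):
    """Nei's unbiased gene diversity from haplogroup percentages.

    h = n / (n - 1) * (1 - sum(p_i ** 2)), with p_i the haplogroup frequencies.
    The percentages are renormalized so that rounding in the table
    (rows summing to 99.99 or 100.01) does not bias the result.
    """
    total = sum(percentages)
    freqs = [p / total for p in percentages]
    homozygosity = sum(f * f for f in freqs)
    return sample_size / (sample_size - 1) * (1 - homozygosity)


# Non-zero haplogroup percentages from Table 74-2.
# Pathan (row 1, n = 21): C, G, F*, H1, L, Q*, R*, R1a1.
PATHAN = [4.762, 9.524, 4.762, 9.524, 9.524, 9.524, 14.29, 38.1]
# Karmali (row 26, n = 16): every sampled chromosome is R2-M124.
KARMALI = [100.0]

h_pathan = haplogroup_diversity(PATHAN, 21)
h_karmali = haplogroup_diversity(KARMALI, 16)
print(f"Pathan h = {h_pathan:.3f}, Karmali h = {h_karmali:.3f}")
```

A population fixed for one haplogroup, like the Karmali sample, gets a diversity of zero, while a sample spread over many haplogroups approaches one. With samples as small as these, the estimates have wide sampling error.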
NRY HG H1-M52, a derivative of M69, has been reported at higher frequencies in southern India, cutting across caste and tribal boundaries (Wells et al., 2001; Kivisild et al., 2003). A few other studies have found HG H1-M52 with high STR variance in Maharashtra (Sengupta et al., 2006) and western India (Trivedi et al., 2008). Thanseem et al. (2006) have suggested that M52 originated in the Indian subcontinent immediately after the Late Pleistocene settlements. The samples available in the literature give an estimated age of 25,000 years (Table 74-1).


NEOLITHIC CATTLE KEEPERS 


The J-M172 clade, implicated in agricultural expansion through Neolithic cattle keepers, is thought to have arisen in the Caucasus and Anatolia and spread to southwestern Europe (Cavalli-Sforza et al., 1994; Semino et al., 2004; Hammer et al., 1998; Rosser et al., 2000); Bedouin and Palestinian Arabs possess the highest frequency of this mutation (55%-66%), followed by Sephardic Jews and Muslim Kurds (40%; see Semino et al., 2004). The J HG is divided into two sub-haplogroups, J1-M267 and J2-M172, with the former showing an ancestral Y-STR haplotype of 14-16-23-11-12 for loci DYS19-DYS388-DYS390-DYS392-DYS393 and the latter 14-15-23-11-12 (Giacomo et al., 2004). A one-step mutated haplotype of this J2, viz. 15-15-23-11-12, is the common clade in India: the Thodas of the Nilgiris possess this J2 at higher frequencies, correlating with their pastoral life centered on the buffalo cult (Kavitha, 2008). It has been proposed that earlier migrations brought agriculture and Dravidian speakers into India, while another, much later one brought rice cultivators from Southeast Asia (Diamond and Bellwood, 2003; Fuller, 2003). HG J2-M172 has shown high STR diversity in Dravidian tribal populations, but to hypothesize that HG J2-M172 is part of a Neolithic expansion would require more evidence (Thanseem et al., 2006). The available datasets thus do not correlate well with these two major events. Archaeology, once again, does not support this contention.
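The “one-step” relationship between the haplotypes quoted above can be checked by counting repeat-number differences locus by locus. The following is a minimal sketch under a stepwise mutation assumption, in which a difference of one repeat at one locus counts as one step; the function names are hypothetical helpers, and the haplotype strings are the ones given in the text.

```python
# Locus order used for the five-locus haplotypes quoted in the text.
LOCI = ("DYS19", "DYS388", "DYS390", "DYS392", "DYS393")

J1_ANCESTRAL = "14-16-23-11-12"
J2_ANCESTRAL = "14-15-23-11-12"
J2_INDIA = "15-15-23-11-12"


def parse_haplotype(text):
    """Turn '14-15-23-11-12' into a tuple of integer repeat counts."""
    return tuple(int(part) for part in text.split("-"))


def step_distance(hap_a, hap_b):
    """Sum of absolute repeat-count differences across all loci."""
    a, b = parse_haplotype(hap_a), parse_haplotype(hap_b)
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


def differing_loci(hap_a, hap_b, loci=LOCI):
    """Names of the loci at which the two haplotypes differ."""
    a, b = parse_haplotype(hap_a), parse_haplotype(hap_b)
    return [name for name, x, y in zip(loci, a, b) if x != y]


print(step_distance(J2_ANCESTRAL, J2_INDIA), differing_loci(J2_ANCESTRAL, J2_INDIA))
print(step_distance(J1_ANCESTRAL, J2_ANCESTRAL), differing_loci(J1_ANCESTRAL, J2_ANCESTRAL))
```

Both comparisons come out at a single step: the Indian J2 haplotype differs from the ancestral J2 haplotype only at DYS19, and the two ancestral haplotypes differ only at DYS388.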

Some studies have found HG J2-M172 at higher frequencies in Dravidian and Indo-European castes than in tribes (Sengupta et al., 2006; Cordaux et al., 2004a). It is absent in East Asia and typically present in Central Asia at frequencies of 10%-20%, leading Cordaux et al. to infer that Indian HG J2-M172 originated from Central Asia rather than West Asia. The data listed in Table 74-2 show the presence of J2 in a wide variety of populations across India. The Neolithic markers of early farmers, HGs E3-M35 and G-M201, which are prevalent in Europe, Anatolia, the southern Caucasus, and Iran, are, however, only sporadic in Indians (Semino et al., 2000; Underhill et al., 2001).
CENTRAL ASIAN EXPANSION


An expansion of HG F-M89 lineages toward Central Asia or the Caucasus also gave rise to a founder that acquired the HG K-M9 mutation, defining another major bifurcation in the phylogeny. Distinctive HG K-M9 sublineages have been observed in India, the Middle East, and Europe, while some HG K-M9 and HG M-M186 lineages are restricted to Oceania. Of the three major lineages descending from K-M9, HG P-M45 is characteristic of North Asia, HG Q-M242 is found in Siberia and North America, and the westward-expanding HG R-M207 is found in Eurasia. K-M9 has also given rise to two other offshoots: HG L-M20, prevalent in the Indian subcontinent, which became L1-M76 in southern India, and HG O-M175, found in eastern Asia across Oriental populations as a whole, including the Chinese, and also in the Austro-Asiatic and Tibeto-Burman speakers of India. Genomic evidence further supports this (HUGO Pan-Asian SNP Consortium, 2009).


ORIGIN OF L AND DRAVIDIAN SPEAKERS 


The NRY HG L-M20 is virtually absent in Europe, but is found irregularly and at low frequencies in populations of the Middle East and the southern Caucasus (Nebel et al., 2001). It occurs at a frequency of 4.3% in Pakistan and 13.5% in Central Asia (Qamar et al., 2002; Semino et al., 2000; Wells et al., 2001). At a resolution of six STR loci, four Chenchu tribal individuals from Andhra Pradesh shared a widespread common haplotype, 14-12-22-10-14-11 (loci DYS19-DYS388-DYS390-DYS391-DYS392-DYS393). This haplotype is shared by Lambadis, Punjabis, and Iranians. An Armenian haplotype, 15-12-23-10-13-11, commonly found in their HG L-M20, differs from it by three mutational steps (Weale et al., 2001). These differences indicate two distinct founders and independent expansions; more data are required to establish the antiquity of these populations. The L subtyping data available so far show the presence of HG L1-M76 in many northwestern states and in the Dravidian-speaking southern belt of India (Trivedi et al., 2008). Sengupta et al. (2006) found a subtype of HG L-M20 to be the most common haplogroup in India, and proposed its early diversification in Dravidian speakers and subsequent expansion toward peripheral regions, suggesting an Indian origin of Dravidian speakers. To support their argument, they treated the Brahmin populations of Tamil Nadu as Dravidian speakers. However, Sahoo et al. (2006) observed the absence of HG L-M20 in IE speakers from Bihar, Orissa, and West Bengal, and concluded that the distribution of NRY HGs in India was associated with geography rather than linguistics. HG O2-M95, found among the Austro-Asiatic (AA) speakers of India as mentioned earlier, is predominantly a Southeast Asian marker (Basu et al., 2003) and virtually absent in central


GENOMICS IN MEDICINE AND HEALTH—INDIAN SUBCONTINENT 


Asia (Wells et al., 2001). Thanseem et al. (2006) found HG O2-M95 at its highest frequency in AA tribes (52%), with a deeper coalescence age (68,000 YBP) that does not fit with the history of other NRY clades. Non-AA castes and tribes carry this marker at a frequency of 6.3%, a scenario that suggests a footprint of earlier AA settlers carrying this defining mutation. HG O3-M122 and its sublineage HG O3e-M134, which spread through East Asia (Su et al., 2000), showed the highest frequency among Tibeto-Burman (TB) speakers of North East India, while the caste groups of the region possess only 3% (Trivedi et al., 2006). Further, since the coalescence ages for HGs C-M130, H-M69, and R2-M124 were deeper than that for HG O-M175, they concluded that AA speakers could not have been the earliest settlers of India. More recent data in Table 74-1, however, suggest an estimated age of 35,000 years for HG O-M175 and 11,700 years for HG O2a-M95.
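A group-level frequency such as the figure quoted for AA tribes is best obtained by pooling chromosomes across populations, not by averaging the population percentages, because sample sizes differ widely. The sketch below pools the O2a-M95 column of Table 74-2 for the thirteen AA-speaking populations (rows 31 to 43); the function name is a hypothetical helper, and the table values are treated as percentages.

```python
# (population, sample size, O2a-M95 percentage) for the AA-speaking
# populations listed in Table 74-2, rows 31 to 43.
AA_O2A_M95 = [
    ("Korku", 59, 81.36),
    ("Asur", 55, 63.64),
    ("Birjia", 24, 95.83),
    ("Korwa", 42, 59.52),
    ("Savar", 47, 14.89),
    ("Kharia", 46, 45.65),
    ("Munda", 60, 50.0),
    ("Juang", 59, 98.31),
    ("Ho", 116, 71.55),
    ("Mahali", 38, 7.895),
    ("Khasi", 92, 41.3),
    ("Mudi", 37, 43.24),
    ("Lodha", 71, 5.634),
]


def pooled_frequency(rows):
    """Sample-size-weighted percentage of carriers across populations."""
    carriers = sum(size * pct / 100 for _name, size, pct in rows)
    chromosomes = sum(size for _name, size, _pct in rows)
    return 100 * carriers / chromosomes


def unweighted_mean(rows):
    """Simple mean of the population percentages, for comparison."""
    return sum(pct for _name, _size, pct in rows) / len(rows)


pooled = pooled_frequency(AA_O2A_M95)
print(f"pooled = {pooled:.1f}%, unweighted mean = {unweighted_mean(AA_O2A_M95):.1f}%")
```

The pooled value for these rows is about 52%, in line with the frequency quoted in the text, although the underlying samples come from several different studies.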


R2 RESTRICTED TO INDIA AND ITS NEIGHBORS 


HG R2-M124, the last major clade of significance to appear in India, is restricted to India, Pakistan, Iran, and southern Central Asia (Kivisild et al., 2003); however, it has been seen at its highest frequency (53%) among the Sinte Romani (Gypsy) (Wells et al., 2001). Cordaux et al. (2004a) have suggested that HG R2 originated in India; this conclusion was based on the presence of the clade in both Dravidian and Indo-European speakers. Within India it is predominant on the east coast and in southern India (Sahoo et al., 2006). Network analysis of the available data has shown that a large number of haplotypes were shared between populations of South India, while the populations of eastern India harbored more discrete haplotypes, originating in situ.


THE ENIGMA OF R1A1


In contrast to R2, HG R1a1-M17, the widespread northern Indian clade among Brahmin-related groups, has been linked with the recent spread of the Kurgan culture, which originated in southern Russia/Ukraine and dispersed to Europe, Central Asia, and India between 3000 and 1000 BCE (Passarino et al., 2001; Quintana-Murci et al., 2001; Wells et al., 2001). In a global analysis, a deeper Palaeolithic time depth of ~15,000 YBP for the HG R1a1-M17 mutation has been suggested (Semino et al., 2000; Wells et al., 2001). Further, two region-specific Y-STR allele patterns have been associated with HG R1a1-M17 among Europeans (Passarino et al., 2002): against the background of HG R1a1-M17, allele 15 at DYS19 and alleles 19 and 21 at locus YCA IIa,b characterize populations of Western Europe, while allele 16 at DYS19 and alleles 19 and 23 at YCA IIa,b characterize Eastern European populations.
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Interestingly, the high frequency of HG Rlal-M17 is 
concentrated around the elevated terrain of central and 
western Asia, and is present at a relatively low frequency 
in Caucusus and Middle East. In Central Asia, its fre- 
quency is highest in the highlands among Tajiks, Kyrgyz, 
and Altais (>50%) and drops down to <10% in the plains 
among the Turkmenians and Kazakhs (Wells et al., 2001; 
Zerjal et al., 2002). In contrast to the above, other stud- 
ies have observed a high HG frequency in Central Asians 
and lower average STR diversity than in Indian castes and 
tribes. This has been attributed to a founder effect from 
southern and western Asia during the early Holocene 
expansion, contributing HG Rlal-M17 chromosomes to 
both Central Asian and South Asian tribes prior to the 
arrival of the Indo-European speakers (Kivisild et al., 
2003; Thanseem et al., 2006; Trivedi et al., 2008). Zerjal 
et al. (2002), however, attributed the low Y- STR diversity 
to a bottleneck effect in Central Asian populations. Some 
authors also propose an Indian origin for the HG Rlal- 
M17 based on the high frequency and associated STR 
variance in India (Sharma et al., 2009), while others attri- 
bute the origin to C.Asia (Wells et al., 2001). While exten- 
sive subclades and subtypes have been identified for NRY 
HG RIb, the Rlal has been the least studied (Underhill 
et al., 2010) for lack of new markers; we await more data 
from our genographic project (www.nationalgeographic. 
com/genographic), in order to further decipher the early 
population-movements scenario. 


LANGUAGE CORRELATES OF NRY 
DISTRIBUTION—A PRINCIPAL COMPONENT 
ANALYSIS 


The overall NRY diaspora based on the hitherto available 
data suggests a pattern of peopling of India. While 
HG R1a1 is prevalent in the northern Indian belt, 
HG O and its derivatives are predominantly seen in the 
east-central and northeastern regions of India, mostly among 
tribals. HG L is restricted to various Dravidian-speaking 
populations of India and some populations of Pakistan 
(Figure 74-2). The data in Table 74-2 and Figure 74-1, 
showing the stacked areas of various alleles in different 
populations, reveal a striking correlation between the NRY 
composition of populations and the languages they speak. 
This is further brought out by the principal component 
analysis (PCA) (Figure 74-2) of the data in Table 74-2. The 
first two components of the PCA account for more than 50% 
of the total variance (Figure 74-3). 
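
As a rough illustration of how such an analysis can be set up, the following minimal Python sketch runs a PCA on a populations-by-haplogroups frequency matrix and reports the share of variance carried by each component, which is the quantity a scree plot displays. It is not the procedure used for Figures 74-2 and 74-3, whose software and settings are not stated here. The function name `pca_explained_variance` is hypothetical, and the input matrix is randomly generated placeholder data, not the frequencies of Table 74-2.

```python
import numpy as np


def pca_explained_variance(freqs):
    """PCA of a (populations x haplogroups) frequency matrix via SVD.

    Returns the population scores on each component and the fraction
    of total variance explained by each component.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each haplogroup column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)     # variance along each component
    ratio = eigvals / eigvals.sum()         # explained-variance fractions
    scores = U * s                          # coordinates of each population
    return scores, ratio


# Placeholder data only (an assumption for illustration): 12 hypothetical
# populations, 6 hypothetical haplogroups, each row summing to 1.
rng = np.random.default_rng(0)
freqs = rng.dirichlet(np.ones(6), size=12)

scores, ratio = pca_explained_variance(freqs)
print("variance explained by PC1 + PC2:", ratio[:2].sum())
```

Plotting the first two columns of `scores` against each other gives a two-dimensional PCA plot, and plotting `ratio` against the component number gives a scree plot. Because haplogroup frequencies within a population sum to 1, the last component of such a matrix carries essentially no variance.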

The geographical distribution of the NRY HGs 
described in the previous paragraph is clearly brought out 
in the PCA plot of the NRY HG frequency distributions 
in these populations. The three groups of language speakers, 
i.e., Indo-European (IE), Dravidian (DR), and Austro-Asiatic (AA), 
irrespective of tribe or caste, are seen to be influenced by 
various eigenvectors: thus the Dravidian speakers, mostly 
tribals, are distributed in the upper right quadrant, while 
the Orissa, West Bengal, and northeastern tribal populations 
speaking AA languages cluster in the bottom right 
quadrant of the plot. Many Brahmin and other populations 
of northern India, speaking IE languages, are clustered 
in the bottom left quadrant of the plot. The overlap 
between IE speakers and DR speakers seen in the middle 
of the plot can be attributed to a confluence of two 
ancestors, miscegenation, or founders to varying degrees, 
or to language replacement. The populations found at the 
extremes, with the highest scores along one component, possessed 
the highest frequencies of one or another NRY allele. This 
can be attributed to a small founder or bottleneck effect 
and uninterrupted expansion without any foreign gene 
flow. The terrain and climate of east-central and 
northeastern India favor such a population expansion. 
However, the absence or low frequencies of many 
other NRY clades in the AA speakers, and the concentration 
of these tribal populations in huge numbers in the 
east-central India/Orissa belt and the northeastern hilly 
tracts of India, suggest a concomitant origin of these NRY 
clades and their language, and a spurt of huge expansion 
from a small founder. This is reiterated by the observation 
that all these populations possessed very little of the other 
parallel and later-derived NRY HGs. 

Figure 74-1 Picture of 100% stacked area of NRY HG allelic composition of 58 Indian and 5 Pakistani 
populations (totaling 2447 samples) arranged according to the language they speak at present. A clear trend of 
various alleles in different language speakers was discernible. Note the relative distributions of R1a1-M17 
(yellow color), O2a-M95 (violet), H1-M52 (green), L-M20 (pink), F*-M89 (red), and C-M130 (black) in 
various language speakers. P = Pakistan; IE = Indo-European speakers; TB = Tibeto-Burman speakers; 
AA = Austro-Asiatic speakers; and DR = Dravidian speakers, all from India. For exact population caste/tribe 
names and their references, refer to Table 74-1. Refer to the color figure. 

Figure 74-3 The scree plot of the principal component analysis of NRY HG data of Indian populations 
available in the literature. 

This proposition is further supported by the finding that 
the Chinese and other oriental populations carrying the 
derivatives of O3 are descendants of a common ancestor 
from somewhere in the northeast/Myanmar region (Shi 
et al., 2005). Basu et al. (2003) have suggested that the AA 
and TB speaking tribal groups might have entered India 
first from a northwest corridor and, much later, some 
through a northeast corridor. Contrary to this, Cordaux 
et al. (2004b) have proposed that northeast India acted as 
a barrier. Kumar et al. (2007), in an extensive recent study, 
identified a strong genetic link among sublinguistic groups 
of Indian AA-speaking populations and have suggested an 
origin of AAs in India, from where they later spread to 
Southeast Asia. The analysis of the present study, in light 
of the population size and the extent of distribution of 
these AA-speaking tribal groups, reiterates the concomitant 
origin of these clades and their language. The time of 
origin of the clade O2a, i.e., 11,700 years ago (Sengupta 
et al., 2006; Table 74-1), fits well with the assumption that 
spoken language originated ~10,000 years ago. A very 
interesting observation was the high frequency (50%-90%) 
of O2a in half of the AA-speaking populations hitherto 
reported in the literature (Table 74-2). 


CONCLUSION 


India, the second continent to be successfully occupied 
by modern man, is itself heterogeneous in terms of 
geography, climate, and populations. The whole of India 
thus cannot be considered as a single gene pool. The 
migratory history as revealed by NRY shows definitive 
pathways, origin, and autochthonous expansion of various 
NRY clades and populations in different parts of the 
country. Many of these populations are more ancient than 
the languages they speak. Thus, as various languages 
developed, presumably in small founders, the population 
expansion and language spread must have taken place 
concomitantly; hence, we see a good correlation between 
languages and NRY in India. 


SUMMARY 


Modern man (Homo sapiens sapiens), originating in 
Africa, first emigrated ~70,000 years ago, walked through 
the coasts of India (southern coastal route model) and 
reached Australia. Since then many migrations, settle- 
ments, and expansions have taken place in various parts 
of India. The island model of human settlements and 
expansions may explain the origin of settled communities 
and languages in India. NRY chromosome markers 
help to unravel the details of early migrations of man into 
South Asia. The analysis of the literature thus suggests 
the origin and expansion of languages, superimposed 
on the genomic data: the data imply small founders, 
autochthonous origin (mutation) of new NRY markers, 
nuclear origins, and uninterrupted expansion/dispersal 
of populations and languages in India, as exemplified by 
Austro-Asiatic (AA) speakers. Better means of communication 
and language presumably led to the settlements 
(founders), rapid expansion, formation of culture 
and societies, and their dispersal to newer horizons and 
territories. South Asia and Southeast Asia thus seem to 
be the cradle of many new founders and autochthonous 
civilizations commensurate with languages that are 
preserved to this day. 